48 research outputs found

    Key Users and Box Office Analysis in an Interest Based Virtual Community

    In recent years, with the growth of Internet technology, users of virtual communities have become not only receivers of information but also important providers of it. However, a large amount of information is aggregated daily, so information overload has become a serious problem, and finding information efficiently is an important issue. In this paper, we posit that users in a virtual community may influence one another, especially highly influential users, referred to as Key Users. We therefore study IMDb (the Internet Movie Database), the largest virtual community for movies on the Internet. We propose an architecture that combines social network analysis with features of IMDb to discover the users who have high influence in the community. We collected 17 months of data (January 2010 to May 2011) from IMDb, covering 17,366 users and 243,074 reviews. Applying the proposed method, we discovered about 22 key users and 111 reviews. We also use the movies' box office figures to validate our results.

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contains sensitive private information raises privacy concerns. To achieve a better trade-off between maximizing utility and preserving privacy, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages

    One-Scan Sanitization of Collaborative Recommendation Association Rules (協同推薦關聯規則之一次掃瞄清除)

    For a given recommended item, a collaborative recommendation association rule set is the smallest association rule set that makes the same recommendation as the entire association rule set by confidence priority. In this work, we propose an efficient one-scan sanitization algorithm to hide collaborative recommendation association rules. To hide association rules, previously proposed algorithms based on the Apriori approach usually require multiple scans of the database to calculate the supports of the large itemsets. We propose using a pattern-inversion tree to store related information so that only one scan of the database is required. Numerical experiments show that the proposed algorithm outperforms previous algorithms, with similar side effects.
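    As a rough illustration of what "the same recommendation by confidence priority" can mean, the sketch below recommends the consequent of the highest-confidence rule whose antecedent is contained in a basket, then checks that a smaller rule set gives the same answers as the full one. The `Rule` type, the `recommend` helper, the rule values, and the item names are all hypothetical; this is not the sanitization algorithm from the paper, only a toy example of the definition.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical association rule: antecedent -> consequent with a confidence."""
    antecedent: FrozenSet[str]
    consequent: str
    confidence: float


def recommend(rules: List[Rule], basket: FrozenSet[str]) -> Optional[str]:
    """Return the consequent of the highest-confidence rule that matches the basket."""
    matching = [r for r in rules if r.antecedent <= basket]
    if not matching:
        return None
    return max(matching, key=lambda r: r.confidence).consequent


# Made-up rule set for illustration only.
full_rules = [
    Rule(frozenset({"A"}), "C", 0.9),
    Rule(frozenset({"A", "B"}), "C", 0.8),
    Rule(frozenset({"B"}), "D", 0.6),
]

# {A, B} -> C never wins: any basket containing A and B also matches
# {A} -> C, which has higher confidence and the same consequent.
reduced_rules = [r for r in full_rules if r.antecedent != frozenset({"A", "B"})]

# Enumerate every basket over the items A and B.
items = ["A", "B"]
baskets = [
    frozenset(combo)
    for size in range(len(items) + 1)
    for combo in combinations(items, size)
]

# True when the smaller rule set recommends the same item for every basket.
same_everywhere = all(
    recommend(full_rules, b) == recommend(reduced_rules, b) for b in baskets
)
```

    In this toy setting the reduced set plays the role of the smaller rule set in the definition: it drops a rule without changing any recommendation.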

    Probabilistic Analysis of Information Center Insecurity

    Information security has become a top priority for many organizations due to a growing number of computer threats. Modeling information system security has been studied extensively in recent years and many techniques have been proposed. However, most techniques for analyzing system insecurity and vulnerabilities begin their modeling or analysis at the source of attack, which is realistic but complex. In this work, we take a different approach by starting the analysis and calculation of insecurity from the resources to be protected. We propose a simple model and an algorithm to efficiently calculate the probability of insecurity for each resource in an information center when a single type of threat exists. Unlike the previous work in [10], which assumes that the attack process ends once an attacker compromises a resource, we allow the attack to continue after a resource is compromised. Numerical simulations showing system insecurity on some common information center topologies are presented. When properly combined with a risk management strategy, the proposed technique can effectively calculate the optimal security investment for information centers.

    Modeling Optimal Security Investment of Information Centers

    We present two algorithms that calculate, respectively, the probability of threat and the optimal investment for information center security. Based on the insecurity flow model [10] for analyzing security violations, we first model information center topology using two basic components, namely resources and filters. Four basic patterns are then identified as the building blocks for the first algorithm, which calculates the accumulative probability of realized threat on each resource. To calculate the optimal investment, a risk-based algorithm that maximizes the total expected net benefit is then proposed. Analyses and numerical simulations for the two algorithms are shown on some common information center topologies to demonstrate the effectiveness of the approach. The technique proposed here can be used to facilitate the analysis and design of more secure information centers.

    Efficient Hiding of Collaborative Recommendation Association Rules with Updates

    We propose an efficient data mining algorithm to hide collaborative recommendation association rules when the database is updated, i.e., when a new data set is added to the original database. For a given predicted item, a collaborative recommendation association rule set [17] is the smallest association rule set that makes the same recommendation as the entire association rule set by confidence priority. Several approaches to hiding collaborative recommendation association rules in static databases have been proposed [17, 18]. However, frequent updates to the database may require repeated sanitization of the original database and the added data sets, and these approaches do not reuse the effort of previous sanitizations. In this work, we propose using a pattern-inversion tree to store the added data set in one database scan. The added data set is then sanitized and merged into the original sanitized database. Numerical experiments show that the proposed approach outperforms direct sanitization of the original and added data sets, with similar side effects.

    HIDING PREDICTIVE ASSOCIATION RULES ON HORIZONTALLY DISTRIBUTED DATA

    In this work, we propose two approaches to hiding predictive association rules where the data sets are horizontally distributed and owned by collaborative but non-trusting parties. In particular, algorithms to hide the Collaborative Recommendation Association Rules (CRAR) and to merge the (sanitized) data sets are introduced. Performance and various side effects of the proposed approaches are analyzed numerically. Comparisons of the non-trusting and trusting third-party approaches are reported. Numerical results show that the non-trusting third-party approach has better processing time, with similar side effects to the trusting third-party approach.